config/v1/types_cluster_operator: Expand upgradeable inputs to cluster scope #926

wking · 2021-05-21T17:52:32Z

Consider these cases:

a. Component A is in a state that allows updates, and nothing in the rest of the cluster would break if A updated.
b. Component A is in a state that allows updates, but component B (which is in-cluster, but not part of A) would break if A updated.
c. Component A would break if it updated.

Operator A should pretty clearly be Upgradeable=True for (a) and Upgradeable=False for (c).

Before this commit, a narrow reading of the comment would have operator A be Upgradeable=True for (b). This commit moves it to Upgradeable=False, based on discussion in openshift/enhancements#762, where it becomes the job of the API-server to set Upgradeable=False if updating the API-server would break nodes running old kubelets. The API-server can say "to unblock minor updates, update your kubelets". The machine-config operator will simultaneously say "hey, your kubelets are old, and here's how to update: $STEPS", but it won't use Upgradeable=False to say that (because the machine-config operator would be happy to have its component nodes updated).
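The mapping from cases (a), (b), and (c) to operator A's Upgradeable condition can be sketched as a small decision function. This is a minimal sketch, assuming hypothetical local types (`Condition`, `UpdateAssessment`) and hypothetical `Reason` strings; it is not the actual openshift/api `ClusterOperatorStatusCondition` type and not how any real operator computes its status.

```go
package main

import "fmt"

// ConditionStatus mirrors the True/False values used by status conditions.
// Hypothetical local stand-in, not the openshift/api type.
type ConditionStatus string

const (
	ConditionTrue  ConditionStatus = "True"
	ConditionFalse ConditionStatus = "False"
)

// Condition is a simplified, hypothetical Upgradeable condition.
type Condition struct {
	Type    string
	Status  ConditionStatus
	Reason  string // hypothetical reason strings, for illustration only
	Message string
}

// UpdateAssessment is what operator A knows before its component updates.
type UpdateAssessment struct {
	SelfSafe         bool     // false in case (c): A itself would break
	BrokenDependents []string // non-empty in case (b): other in-cluster components would break
}

// upgradeable applies the cluster-scoped reading: A reports False when it
// would break itself (c) or when it would break another component (b).
func upgradeable(a UpdateAssessment) Condition {
	switch {
	case !a.SelfSafe:
		return Condition{Type: "Upgradeable", Status: ConditionFalse,
			Reason: "ComponentWouldBreak", Message: "updating would break this component"}
	case len(a.BrokenDependents) > 0:
		return Condition{Type: "Upgradeable", Status: ConditionFalse,
			Reason: "DependentsWouldBreak", Message: fmt.Sprintf("updating would break: %v", a.BrokenDependents)}
	default:
		return Condition{Type: "Upgradeable", Status: ConditionTrue, Reason: "AsExpected"}
	}
}

func main() {
	cases := map[string]UpdateAssessment{
		"a": {SelfSafe: true},
		// Case (b) from the API-server example: old kubelets would break.
		"b": {SelfSafe: true, BrokenDependents: []string{"nodes running old kubelets"}},
		"c": {SelfSafe: false},
	}
	for _, name := range []string{"a", "b", "c"} {
		c := upgradeable(cases[name])
		fmt.Printf("(%s) Upgradeable=%s %s\n", name, c.Status, c.Reason)
	}
}
```

Under this reading the API-server is operator A in case (b), while the machine-config operator stays Upgradeable=True and reports the kubelet remediation steps through some other channel.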

As pointed out in discussion in openshift/enhancements#762, this is a bit of a bottomless pit. For example, component A may be removing a deprecated feature on update, and there may be user workloads that depend on that feature but use it only rarely. Component A might reasonably think "nobody has used $OUTGOING_FEATURE in the last week, so I'm Upgradeable=True", and then, post-update, the user workload would hit the removed API and break. And obviously in-cluster components will have even less visibility into any out-of-cluster components that depend on them. So using Upgradeable=False to protect other components from breaking is going to be a best-effort sort of thing. But this commit pivots so that it's more clear that we'll put that effort in when we can.


wking · 2021-05-21T18:03:18Z

I'm fuzzy on this comment about the API-server being a client of the kubelet for exec flows. Perhaps that is sufficient to get the kubelet-skew guard in under (c)? And maybe we want the godocs to be generic enough that operator B could say "hey, A is going too fast, wait for me to catch up" in cases where A isn't smart enough to notice B falling behind? Wording would be something like:

Upgradeable indicates whether the operator considers the OpenShift core safe to upgrade based on the current cluster state.
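With the cluster-scoped wording above, any operator's Upgradeable=False counts, so operator B can hold back an update on A's behalf. A minimal sketch of that aggregation, assuming (as the "to unblock minor updates" phrasing earlier suggests) that a single Upgradeable=False from any ClusterOperator blocks minor updates; `operatorStatus` and `minorUpdateBlockers` are hypothetical names, not the cluster-version operator's actual code.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// operatorStatus is a hypothetical, simplified view of one ClusterOperator.
type operatorStatus struct {
	Name        string
	Upgradeable bool
	Message     string
}

// minorUpdateBlockers collects every operator reporting Upgradeable=False.
// Any operator may block, including B speaking up about A's update.
func minorUpdateBlockers(ops []operatorStatus) []string {
	var blockers []string
	for _, op := range ops {
		if !op.Upgradeable {
			blockers = append(blockers, fmt.Sprintf("%s: %s", op.Name, op.Message))
		}
	}
	sort.Strings(blockers) // stable ordering for display
	return blockers
}

func main() {
	ops := []operatorStatus{
		{Name: "A", Upgradeable: true},
		{Name: "B", Upgradeable: false, Message: "A is going too fast, wait for me to catch up"},
	}
	if b := minorUpdateBlockers(ops); len(b) > 0 {
		fmt.Println("minor updates blocked:\n  " + strings.Join(b, "\n  "))
	}
}
```

Here A reports Upgradeable=True because it cannot see B's problem, but B's condition alone is enough to block the update.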

soltysh

/lgtm
/approve

openshift-ci · 2021-06-02T10:51:39Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: soltysh, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here


Needs approval from an approver in each of these files:

~~OWNERS~~ [soltysh]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

wking force-pushed the expand-upgradeable-scope-to-cluster-state branch from d7dc014 to 1a82848 May 21, 2021 17:52

openshift-ci bot requested review from mfojtik and soltysh May 21, 2021 17:52

wking mentioned this pull request May 21, 2021

eus-upgrades-mvp: don't enforce skew check in MCO openshift/enhancements#762

Merged

soltysh approved these changes Jun 2, 2021


openshift-ci bot assigned soltysh Jun 2, 2021

openshift-ci bot added the lgtm Indicates that a PR is ready to be merged. label Jun 2, 2021

openshift-ci bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 2, 2021

openshift-merge-robot merged commit 2deea64 into openshift:master Jun 14, 2021


3 participants
